Analyzing and Visualizing WWII Aerial Bombing Data

By Justin McKinney and Justin Ashbaugh

Introduction

World War II is recorded in the history books as one of the most catastrophic and devastating events in all of history. It caused billions of dollars in damage and millions of casualties. Though these numbers are well known, it is worth pondering what it takes to produce destruction of this magnitude. The data set we analyze here holds a wealth of information about the destructive capabilities of aircraft during WWII. It includes many important fields such as aircraft type, bomb type, target, bomb size, and much more. It is important to note, however, that this data only represents targets and missions conducted by the Allied forces, excluding Russia. It is not representative of the Axis powers' contribution to the world's destruction.

Outline

In this tutorial, our goal is to give you a more approachable and readable view of the historical data collected during the war. In its raw form the data is incredibly clunky, laden with missing values and column after column of confusing information. Our tutorial tidies this data, then manipulates and frames it for easier understanding and investigation. We hope that after reading it you walk away with a better understanding of the destruction of WWII and the sheer number of bombs that were dropped across the multiple theatres of war. Perhaps with this knowledge you can hold an engaging conversation with your friends, state a fun fact or two, or even write a paper about the tragedy and horror of human ingenuity. If you already knew all of this, we hope you enjoy our visual representations and a short refresher course in destructive history. If this is new to you, rejoice in the fact that we live in a world where this war has ended, and that you need not fear the looming threat of a B17 bomber overhead.

A few things to note

Tidy the data

Let's drop some of the unnecessary columns that we won't be using, to make the data a bit more readable.
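As a rough idea of what this tidying step can look like, here is a minimal sketch on a tiny made-up table. The column names ("Source ID", "Unit ID", and so on) are assumptions for illustration and may not match the raw file exactly.

```python
import pandas as pd

# Made-up rows for illustration only; the column names are assumptions
# about the raw file, not a copy of it.
raw = pd.DataFrame({
    "Mission ID": [1, 2, 3],
    "Mission Date": ["1943-08-15", "1944-01-10", "1945-03-02"],
    "Aircraft Series": ["B17", "B24", "B29"],
    "Total Weight (Tons)": [10.0, None, 25.0],
    "Source ID": [101, 102, 103],    # bookkeeping column we do not need
    "Unit ID": [None, None, None],   # column with no values at all
})

# Drop the named column we will not use, then any column that is entirely empty.
tidy = raw.drop(columns=["Source ID"]).dropna(axis="columns", how="all")

print(tidy.columns.tolist())
```

Dropping by name keeps the choice explicit, while `dropna(axis="columns", how="all")` only removes columns that carry no information at all, so partially filled columns survive.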

Step 2: Analysis

Now that we have loaded the data, let's do some basic analysis! Let's see how many tons of TNT worth of bombs the Allies dropped over the course of the war. We can do this easily by selecting the column containing the total tons of weaponry dropped and calling sum on that column.
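In pandas this is a one-liner. The sketch below uses a made-up column of weights, and the column name "Total Weight (Tons)" is an assumption about the data set.

```python
import pandas as pd

# Made-up weights for illustration; "Total Weight (Tons)" is an assumed column name.
missions = pd.DataFrame({"Total Weight (Tons)": [10.0, None, 25.0, 4.5]})

# Series.sum() skips missing values by default, so rows without a weight are ignored.
total_tons = missions["Total Weight (Tons)"].sum()
print(total_tons)
```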

That is a pretty insane amount of weaponry dropped over the course of WW2. Now let's try to recreate the graph from the website we got the data from, which showed which countries had the most tons of weaponry dropped on them.

Wow! That is a lot of bombs dropped on Germany. Now let's see who was dropping those bombs. We can use a similar strategy to the last graph, except this time we will group by the country flying the mission instead of the target country.
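Both rankings (by target country and by the country flying the mission) come down to the same group-and-sum pattern. Here is a minimal sketch with made-up rows. The helper `tons_by` is hypothetical, and the column names "Country", "Target Country" and "Total Weight (Tons)" are assumptions.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions about the data set.
missions = pd.DataFrame({
    "Country": ["USA", "USA", "GREAT BRITAIN", "USA"],
    "Target Country": ["GERMANY", "JAPAN", "GERMANY", "GERMANY"],
    "Total Weight (Tons)": [10.0, 5.0, 8.0, 4.0],
})

def tons_by(df, column, weight="Total Weight (Tons)"):
    """Total tons per value of `column`, largest first (hypothetical helper)."""
    return df.groupby(column)[weight].sum().sort_values(ascending=False)

by_target = tons_by(missions, "Target Country")   # who was bombed the most
by_attacker = tons_by(missions, "Country")        # who did the bombing

print(by_target)
print(by_attacker)
```

Each result is a sorted Series, so something like `by_target.head(10).plot.bar()` would turn it into a bar chart if matplotlib is installed.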

Unsurprisingly, the USA and Great Britain dropped the most bombs. Now let's find which planes did most of this bombing.

The B17 leads this category, having dropped almost 30% of all bombs dropped by the US and Great Britain during the war.
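A share like this can be computed by dividing each aircraft's total by the grand total. The sketch below uses made-up numbers, so the percentages it prints are not the real ones, and the column names are assumptions.

```python
import pandas as pd

# Made-up rows for illustration; the resulting shares are NOT the real figures.
missions = pd.DataFrame({
    "Aircraft Series": ["B17", "B24", "B17", "LANCASTER"],
    "Total Weight (Tons)": [30.0, 20.0, 20.0, 30.0],
})

# Tons per aircraft type, then each type's share of the overall total in percent.
tons = missions.groupby("Aircraft Series")["Total Weight (Tons)"].sum()
share_pct = (tons / tons.sum() * 100).sort_values(ascending=False)

print(share_pct.round(1))
```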

B17

This is impressive but not unexpected, as the B17 was one of the most mass-produced and effective bombers of the war. Britannica states that the B17 "was the mainstay of the strategic bombing campaign" for the US.

Now let's try to get a nice overview of the amount of bombing that occurred over time.
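One way to get such an overview is to add up the tonnage per calendar month. This is a minimal sketch on made-up rows, assuming the date column is called "Mission Date" and has already been parsed into timestamps.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions about the data set.
missions = pd.DataFrame({
    "Mission Date": pd.to_datetime(
        ["1940-11-06", "1940-11-20", "1943-08-17", "1945-08-06"]
    ),
    "Total Weight (Tons)": [2.0, 3.0, 10.0, 7.0],
})

# Bucket each mission into its calendar month and sum the tonnage per bucket.
month = missions["Mission Date"].dt.to_period("M")
monthly = missions.groupby(month)["Total Weight (Tons)"].sum()

print(monthly)
```

The result is indexed by month, so calling `monthly.plot()` would give a simple line chart of bombing over time if matplotlib is installed.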

Hmm, what is that small blip we see right before 1941? That seems to be a lot of bombing for so early in the war.
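One way to look into a spike like this is to narrow the dataframe to the suspicious date window and sort by weight. The rows below are made up, and the column names (including "Theater of Operations") and the exact window are assumptions for illustration.

```python
import pandas as pd

# Made-up rows for illustration; values and column names are assumptions.
missions = pd.DataFrame({
    "Mission Date": pd.to_datetime(
        ["1940-06-01", "1940-11-06", "1940-11-07", "1941-03-01"]
    ),
    "Theater of Operations": ["ETO", "EAST AFRICA", "EAST AFRICA", "MTO"],
    "Total Weight (Tons)": [1.0, 50.0, 2.0, 3.0],
})

# Keep only missions flown in the last quarter of 1940 (inclusive on both ends).
start, end = pd.Timestamp("1940-10-01"), pd.Timestamp("1940-12-31")
in_window = missions["Mission Date"].between(start, end)

# Heaviest missions first, so whatever causes the blip shows up at the top.
blip = missions[in_window].sort_values("Total Weight (Tons)", ascending=False)
print(blip)
```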

It turns out this occurred in Africa and was related to the Northern front, East Africa, 1940. Our data seems to match the Wikipedia article's section on the British attack on Fort Gallabat. Our data shows 6 WELLESLEY bombers attacking a fort with around 5000 tons of TNT worth of bombs. Wikipedia states "An RAF contingent of six Wellesley bombers and nine Gladiator fighters were thought sufficient to overcome the 17 Italian fighters and 32 bombers believed to be in range. The infantry assembled 1–2 mi (2–3 km) from Gallabat, whose garrison was unaware that an attack was coming, until the RAF bombed the fort and put the wireless out of action." An interesting little discovery.

This next section of code relies on a slightly more advanced understanding of plotting and dataframe manipulation in pandas. The goal of this cell is to build a list of all aircraft used during WWII, then create and label a scatter plot showing the amount of TNT (in tons) dropped by each aircraft type over the course of the war. We use a subplot trick to draw multiple plots on the same single plane (pun intended). We then loop through our list of planes: for each plane we filter the dataframe down to the rows for that plane and plot those rows on our graph.
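The loop described above can be sketched roughly as follows. This is a simplified version on made-up rows, not necessarily the exact cell we ran, and the column names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration; column names are assumptions about the data set.
missions = pd.DataFrame({
    "Mission Date": pd.to_datetime(
        ["1943-01-05", "1943-06-10", "1944-02-20", "1944-09-01", "1945-03-09"]
    ),
    "Aircraft Series": ["B17", "B24", "B17", "LANCASTER", "B24"],
    "Total Weight (Tons)": [10.0, 8.0, 12.0, 15.0, 9.0],
})

# One figure with a single set of axes that every aircraft type is drawn onto.
fig, ax = plt.subplots(figsize=(10, 6))

aircraft_types = missions["Aircraft Series"].dropna().unique()
for aircraft in aircraft_types:
    # Keep only the rows for the current plane and add them to the shared axes.
    rows = missions[missions["Aircraft Series"] == aircraft]
    ax.scatter(rows["Mission Date"], rows["Total Weight (Tons)"], s=12, label=aircraft)

ax.set_xlabel("Mission date")
ax.set_ylabel("Total weight (tons)")
ax.set_title("Tons dropped per mission, by aircraft type")
ax.legend(title="Aircraft")
```

Because every `ax.scatter` call targets the same axes, each aircraft type gets its own color and legend entry while sharing one set of date and weight scales.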

At first glance the above graph might look like it was drawn incorrectly, as if the code does not work the way it is supposed to. This was our initial thought when we saw the clustered results alongside two major outliers far above every other data point. Since these points made no sense to us, we decided to pull them up from our dataframe to see what was going on. The following code is what we used to learn more about our very large outliers and determine what the issue could be.
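A simple way to pull out points like these is to ask pandas for the rows with the largest weights. The sketch below uses made-up rows with placeholder targets and assumed column names, so it only illustrates the lookup, not the real records.

```python
import pandas as pd

# Made-up rows for illustration; values and column names are assumptions.
missions = pd.DataFrame({
    "Mission ID": [1, 2, 3, 4, 5],
    "Aircraft Series": ["B17", "B29", "B24", "B29", "LANCASTER"],
    "Target City": ["CITY A", "CITY B", "CITY C", "CITY D", "CITY E"],
    "Total Weight (Tons)": [10.0, 15000.0, 8.0, 20000.0, 12.0],
})

# The two heaviest missions, largest first, with every column kept for inspection.
outliers = missions.nlargest(2, "Total Weight (Tons)")
print(outliers)
```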

After successfully isolating the two outliers, we were able to determine that there was no bug in our code. The anomaly is real: these are the only two atomic bombs ever used in warfare. We were surprised to have forgotten these two events, though they immediately made the data clear and understandable. The events, as listed in the dataframe, correspond to the atomic bombings of Hiroshima and Nagasaki by the United States in August of 1945. The bombs Fat Man and Little Boy were dropped, with Fat Man being the larger of the two. The bombs can be seen below (Little Boy first, Fat Man second). Next to them is the B29, the bomber that carried them.

Fat Man Little Boy b29

Step 3: More in depth visualization

Now let's bring in folium to visualize which areas were bombed the most. Folium lets us create an interactive heatmap of the areas most bombed by the Allies throughout the war. We are going to build a frequency heatmap overlaid with circle markers that denote the most intensive explosions (the top 10,000). These circles indicate, not to scale, the amount of damage caused relative to other strikes. We will also label a few key cities to guide your interpretation.
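Here is a rough sketch of how such a map can be put together. The rows are made up, the column names are assumptions, `build_map` is a hypothetical helper, and the map center, radii and colors are arbitrary choices rather than the settings we used.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions about the data set.
missions = pd.DataFrame({
    "Target Latitude": [52.52, 52.52, 35.68, None],
    "Target Longitude": [13.40, 13.40, 139.69, 2.35],
    "Total Weight (Tons)": [10.0, 40.0, 25.0, 5.0],
})

# Only missions with a known location can be placed on the map.
located = missions.dropna(subset=["Target Latitude", "Target Longitude"])

# The heatmap needs a plain list of [lat, lon] pairs, one per mission.
heat_points = located[["Target Latitude", "Target Longitude"]].values.tolist()

# The heaviest strikes get circle markers (top 2 here, top 10,000 in the tutorial).
top_strikes = located.nlargest(2, "Total Weight (Tons)")

def build_map(heat_points, top_strikes):
    """Hypothetical helper: frequency heatmap plus circles for the heaviest strikes."""
    import folium
    from folium.plugins import HeatMap

    bomb_map = folium.Map(location=[48.0, 10.0], zoom_start=4)
    HeatMap(heat_points, radius=8).add_to(bomb_map)

    max_tons = top_strikes["Total Weight (Tons)"].max()
    for _, row in top_strikes.iterrows():
        folium.CircleMarker(
            location=[row["Target Latitude"], row["Target Longitude"]],
            # Radius grows with tonnage relative to the largest strike (not to scale).
            radius=3 + 12 * row["Total Weight (Tons)"] / max_tons,
            color="crimson",
            fill=True,
        ).add_to(bomb_map)

    # A plain marker with a tooltip is one way to label a key city.
    folium.Marker(location=[52.52, 13.40], tooltip="Berlin").add_to(bomb_map)
    return bomb_map
```

Calling `build_map(heat_points, top_strikes)` in a notebook cell displays the interactive map, and `.save("map.html")` on the result writes it to a file.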

Step 4: A bit of machine learning (because everything needs machine learning)

Let's say, hypothetically, that you are a person living your life during WW2. We are going to train a machine learning model to predict which type of aircraft would be most likely to drop a bomb on your head. This is obviously very tongue in cheek, but it is an interesting way to learn about some basic machine learning concepts.

Details

We are going to train a Decision Tree model in order to predict aircraft type based on a given longitude and latitude. We will split the data into a training set and a testing set, then train the model on the training set. After that we can use the testing set to test how accurate our model is. This is called holdout validation and is a common technique in machine learning.
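The steps above can be sketched with scikit-learn as follows. The data here is a made-up toy set with two cleanly separated clusters, so its perfect score says nothing about the real data. The column names, the 25% test split and the random seeds are all assumptions.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up toy data: one cluster of points over Europe labeled B17 and one over
# Japan labeled B29. Column names are assumptions about the data set.
rows = []
for i in range(20):
    offset = i * 0.05
    rows.append({"Target Latitude": 50.0 + offset,
                 "Target Longitude": 5.0 + offset,
                 "Aircraft Series": "B17"})
    rows.append({"Target Latitude": 35.0 + offset,
                 "Target Longitude": 139.0 + offset,
                 "Aircraft Series": "B29"})
missions = pd.DataFrame(rows)

features = missions[["Target Latitude", "Target Longitude"]]
labels = missions["Aircraft Series"]

# Holdout validation: keep 25% of the rows aside and never train on them.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Accuracy is the fraction of held-out rows whose aircraft type was guessed correctly.
accuracy = accuracy_score(y_test, model.predict(X_test))

# Ask the model about a single location (roughly Tokyo).
query = pd.DataFrame({"Target Latitude": [35.68], "Target Longitude": [139.69]})
prediction = model.predict(query)[0]

print(accuracy, prediction)
```

Scoring on rows the model never saw is what makes the accuracy meaningful, since a decision tree can easily memorize its training set.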

Obviously this is not an intelligent application of machine learning, but it does produce results that are not wildly incorrect, with an accuracy score of 0.58 out of 1. Trying to predict what aircraft would bomb a specific location is an impossible task, but it is fun to see what the model guesses for any longitude and latitude you throw at it.

Let's try a few tests:

Some of these predictions do seem reasonable. Tokyo, for example, is logical, as the US did not bomb Japan at scale until late in the war and the B29 came into service much later in WW2. Brussels also makes some sense because B17s were mostly deployed over Europe. Not all of them make complete sense, but it is interesting to see how the inputs affect the prediction.